翻訳と辞書
Words near each other
・ Couple Days Off
・ Couple in Spirit
・ Couple interview
・ County Wexford (1914–23)
・ County Wexford (UK Parliament constituency)
・ County Wicklow
・ County Wildlife Site
・ County Yard
・ County-class cruiser
・ County-class destroyer
・ County-class offshore patrol vessel
・ County-controlled city
・ County-level city
・ Countyline, Oklahoma
・ CountyWatch
Count–min sketch
・ Coup (album)
・ Coup (bridge)
・ Coup (disambiguation)
・ Coup by Memorandum
・ Coup contrecoup injury
・ Coup d'Etat (comics)
・ Coup d'Etat (film)
・ Coup d'Etat (G-Dragon album)
・ Coup d'Etat (Muslimgauze album)
・ Coup d'Etat (Plasmatics album)
・ Coup d'Etat + One of a Kind & Heartbreaker
・ Coup d'etat Brooklyn
・ Coup d'état
・ Coup d'état (disambiguation)


Dictionary Lists
翻訳と辞書 辞書検索 [ 開発暫定版 ]
スポンサード リンク

Count–min sketch : ウィキペディア英語版
Count–min sketch
In computing, the count–min sketch (CM sketch) is a probabilistic data structure that serves as a frequency table of events in a stream of data. It uses hash functions to map events to frequencies, but unlike a hash table uses only sub-linear space, at the expense of overcounting some events due to collisions. The count–min sketch was invented in 2003 by Graham Cormode and S. Muthu Muthukrishnan and described by them in a 2005 paper.
Count–min sketches are somewhat similar to Bloom filters; the main distinction is that Bloom filters represent sets, while CM sketches represent multisets. Spectral Bloom filters with multi-set policy are similar in concept to the count–min sketch.
== Data structure ==
The goal of the basic version of the count–min sketch is to consume a stream of events, one at a time, and count the frequency of the different types of events in the stream. At any time, the sketch can be queried for the frequency of a particular event type ( for some ), and will return an estimate of this frequency that is within a certain distance of the true frequency, with a certain probability.
The actual sketch data structure is a two-dimensional array of columns and rows. The parameters and are fixed when the sketch is created, and determine the time and space needs and the probability of error when the sketch is queried for a frequency or inner product. Associated with each of the rows is a separate hash function; the hash functions must be pairwise independent. The parameters and can be chosen by setting and , where the error in answering a query is within a factor of with probability .
When a new event of type arrives we update as follows: for each row of the table, apply the corresponding hash function to obtain a column index . Then increment the value in row , column by one.
Several types of queries are possible on the stream.
* The simplest is the ''point query'', which asks for the count of an event type . The estimated count is given by the least value in the table for , namely \hat a_i=\min_j \mathrm(h_j(i) ), where \mathrm is the table.
This estimate has the guarantee that \hat a_i \leq a_i + \epsilon \sum_j^ |a_j| with probability 1-\delta, where is the true frequency with which occurred in the stream.
* An ''inner product query'' asks for the inner product between the histograms represented by two count–min sketches, \mathrm_a and \mathrm_b.
Small modifications to the data structure can be used to sketch other different stream statistics.

抄文引用元・出典: フリー百科事典『 ウィキペディア(Wikipedia)
ウィキペディアで「Count–min sketch」の詳細全文を読む



スポンサード リンク
翻訳と辞書 : 翻訳のためのインターネットリソース

Copyright(C) kotoba.ne.jp 1997-2016. All Rights Reserved.